Final Project¶

Who Looks Like Me?

Prepared by

  • Nikhil Shankar C S
  • Sreehari Prathap
A machine learning model built on InceptionV3 (pretrained on the ImageNet dataset) that predicts how similar a face is to celebrity faces.¶

Data Collection¶

  • Initially we used a Hugging Face dataset in Parquet format, parsed it, and created images from the file. It was a good dataset, but after some trials we found that it would not be helpful for our purpose.

  • It contained 18,000+ images for 1,000+ celebrities, around 18 per person.

  • After realizing this, we found another dataset on Kaggle with 18k+ images for 100 celebrities, which we use for this project.

HuggingFace Link : https://huggingface.co/datasets/ares1123/celebrity_dataset

Kaggle Link : https://www.kaggle.com/datasets/hereisburak/pins-face-recognition

EDA¶

In [7]:
from EDA import EDA

eda = EDA("../dataset", "EDA")
eda.calculate_eda()
c:\Users\DELL\Desktop\Conestoga\AIML\FOML-FinalProject\WhoLooksLikeMe\WhoLooksLikeMe-Model\WLLM-Nikhil-Model-Classes-2\EDA.py:97: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.barplot(x=image_counts.index, y=image_counts.values, palette='viridis')
Personality with minimum images: lionel messi (86 images)
Personality with maximum images: leonardo dicaprio (237 images)
c:\Users\DELL\Desktop\Conestoga\AIML\FOML-FinalProject\WhoLooksLikeMe\WhoLooksLikeMe-Model\WLLM-Nikhil-Model-Classes-2\EDA.py:177: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.

  sns.barplot(x=list(type_counts.keys()), y=list(type_counts.values()), palette='Set2')
All EDA results have been saved in the output directory.

Train and Test Setup¶

We created 2 classes

  • DataPreparation.py
  • TrainTestSplitter.py

These two classes do the following:

  • DataPreparation selects N random folders from the original dataset path and copies them to the output folder. It also accepts a range, so you can choose which range of folders to move for training and testing.
  • TrainTestSplitter accepts a folder path (essentially the output path of DataPreparation) and splits the files into train and test folders. It accepts a parameter that decides how many images per class are moved to the test folder.
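The splitting step described above can be sketched as follows. This is a minimal illustration, not the project's TrainTestSplitter: the function name split_class_folders, the seed argument, and the train/ and test/ folder layout are assumptions made for the example.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_class_folders(dataset_dir, test_file_count, seed=0):
    """Move `test_file_count` random files per class folder into test/<class>,
    and every remaining file into train/<class>. (Hypothetical helper.)"""
    root = Path(dataset_dir)
    rng = random.Random(seed)
    # Collect class folders first so the new train/ and test/ folders are not revisited.
    class_dirs = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name not in ("train", "test")
    )
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        test_files = set(files[:test_file_count])
        for f in files:
            split = "test" if f in test_files else "train"
            target = root / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target / f.name))


def demo_split():
    """Build a tiny fake dataset (2 classes x 10 empty files) and split it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("person_a", "person_b"):
            class_dir = Path(tmp) / name
            class_dir.mkdir()
            for i in range(10):
                (class_dir / f"{i}.jpg").write_bytes(b"")
        split_class_folders(tmp, test_file_count=4)
        return {
            split: {
                d.name: len(list(d.iterdir()))
                for d in sorted((Path(tmp) / split).iterdir())
            }
            for split in ("train", "test")
        }


print(demo_split())
```

With test_file_count=4, every class contributes exactly 4 test images, which matches the support of 4 per class in the evaluation reports further down.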
In [ ]:
from TrainTestSplitter import TrainTestSplitter
from DataPreparation import DataPreparation
from WLLMSimilarityCalculatorAdvancedCorrected import SimilarityCalculatorAdvancedCorrected2
from WLLMModel import WLLMModel
import os
from datetime import datetime
from WLLMModelLoader import WLLMModelLoader


# Don't edit this
modelname_prefix = "WLLM-Model-Selected"
username = input("Enter your name")
formatted_date = datetime.now().strftime("%m-%d-%H-%M")
sample_class_range= (50,102)
sample_class_range_name = "L2"

model_name = f"{modelname_prefix}-{username}-{sample_class_range_name}-{formatted_date}"

#TODO Point this to your repository of all images
original_data_dir = "../../../DontEditThese/Dataset3"

model_info_root_save_dir = "../SavedTrainingData/savedmodels"
dataset_dir = f"../TrainingDataImages/{model_name}"  # Path to your dataset folder
savedmodels_dir = f"{model_info_root_save_dir}/{model_name}"
embeddings_dir = f"{model_info_root_save_dir}/{model_name}/embeddings"

# Create the output folders if they do not already exist
for directory in (dataset_dir, embeddings_dir, savedmodels_dir):
    os.makedirs(directory, exist_ok=True)


# Copy the class folders in the selected sample range from the original repository to dataset_dir
data_preparation = DataPreparation(original_data_dir, dataset_dir, sample_range=sample_class_range)
selected_classes = data_preparation.prepare_data()

#Split data to train and test
splitter = TrainTestSplitter(dataset_dir)
splitter.create_train_test_split(test_file_count=4)

Please note that this notebook was assembled after all training and testing had been completed, following a lot of experimentation. Some of the code blocks were therefore not re-executed due to time constraints.

We saved the plots and the measured training times to files (the times as CSV), which are attached and shown in this notebook.

Model Architecture¶

The final model we chose is the InceptionV3 base model pretrained on the ImageNet dataset.

  • Optimizer : Adam
  • Learning rate : 0.001
  • Top was not included.
  • 100 epochs
  • stop_early: true
  • patience : 10
  • monitoring metric : val_loss
  • Unfrozen layers : Last 15 layers
  • Trainable parameters : 501,044
  • Total parameters: 21,909,332
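To make the early stopping settings above concrete, here is a small sketch of patience-based stopping on val_loss. It is written in plain Python for illustration and is not the project's training code. The function name early_stopping_epoch and the sample loss values are hypothetical, and it assumes any decrease in val_loss counts as an improvement.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without
    val_loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never ran out of patience: training runs for all provided epochs.
    return len(val_losses), best_epoch


# Hypothetical val_loss curve: improves for 3 epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 20
print(early_stopping_epoch(losses, patience=10))
```

With patience 10, a run capped at 100 epochs can therefore end much earlier, and the weights from the best epoch are the ones worth keeping.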

The model summary below is read from a saved file.

In [ ]:
file_path = '../../WhoLooksLikeMe-Model/SavedTrainingData/savedmodels/WLLM-Model-Nikhil-L2-12-10-01-24/ModelSummary.txt'
with open(file_path, 'r', encoding="utf-8") as file:
    for line in file:
        print(line.strip())
Model Architecture Head¶
  • We added two layers on top of the base.
  • Embedding layer: used to extract the embedding of each person, which is saved for the cosine similarity calculation. This is a Flatten layer.
  • Classification layer: a dense layer with one neuron per class that uses softmax to produce a probability for each class.
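As a rough check on the size of the classification layer, the parameter count of a dense layer can be worked out by hand. The sketch below assumes the dense layer is fed directly by the 2048-dimensional embedding output (the shape stated later for the embedding layer) and that it has a bias term. The helper name dense_parameter_count is hypothetical.

```python
def dense_parameter_count(input_units, output_units, use_bias=True):
    """Weights plus optional biases of a fully connected layer."""
    return input_units * output_units + (output_units if use_bias else 0)


print(dense_parameter_count(2048, 52))  # head of a 52-class model
print(dense_parameter_count(2048, 10))  # head of a 10-class subset model
```

Under these assumptions the 52-class head holds 106,548 parameters, so most of the 501,044 trainable parameters listed earlier would come from the unfrozen layers of the base model.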

Training - Classification Model¶

In [ ]:
## Training the model
model = WLLMModel(dataset_dir, savedmodels_dir)
model.train_model(output_dir=savedmodels_dir, epochs=100, batch_size=32)

  • We trained 4 models.

  • 3 models were each trained on 10 celebrities.

  • The 4th is a single model trained on 52 celebrities together.

Confusion Matrix¶

[Figure: confusion matrix]

Scanning the plot vertically, we can identify which personality is predicted most often.

Scanning the plot horizontally, we can identify which personalities are predicted incorrectly.
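The way of reading the matrix described above can be illustrated with a toy example. The sketch assumes rows hold the true class and columns the predicted class, which is consistent with the description but not confirmed by the plotting code. The labels "a" and "b" and all function names are hypothetical.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs: rows are true classes, columns predicted."""
    return Counter(zip(true_labels, predicted_labels))


def column_total(counts, label):
    """How often `label` was predicted, right or wrong (sum down one column)."""
    return sum(n for (_, pred), n in counts.items() if pred == label)


def row_errors(counts, label):
    """Wrong predictions for images of `label` (off-diagonal cells of one row)."""
    return {
        pred: n
        for (true, pred), n in counts.items()
        if true == label and pred != label
    }


def precision_recall(counts, label):
    """Precision = correct / column total, recall = correct / row total."""
    correct = counts[(label, label)]
    predicted = column_total(counts, label)
    actual = sum(n for (true, _), n in counts.items() if true == label)
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    return precision, recall


# Toy data: 4 test images per class, as in the test split used here.
true = ["a"] * 4 + ["b"] * 4
pred = ["a", "a", "a", "b", "b", "b", "a", "a"]
counts = confusion_counts(true, pred)
print(column_total(counts, "a"))      # times "a" was predicted
print(row_errors(counts, "a"))        # what true "a" images were mistaken for
print(precision_recall(counts, "a"))  # (precision, recall) for "a"
```

In this toy case class "a" is predicted 5 times but is correct only 3 times, giving precision 0.60 and recall 0.75, the same pattern seen in several rows of the evaluation report.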

In [20]:
from ClassificationModel import ClassificationModel


large_model_path = "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-L2-12-10-01-24/best_model.keras"
names_csv_path = "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-L2-12-10-01-24/WLLM-Model-Nikhil-L2-12-10-01-24.csv"
image_dataset_path = "../dataset"
predictions_save_path = "../predictions/model52"

classificationModel = ClassificationModel(large_model_path, names_csv_path)
Embedding model created using layer: 'embedding'.
In [2]:
classificationModel.evaluate_and_print_images("../TrainingDataImages/WLLM-Model-Nikhil-L2-12-10-01-24/test")
Found 208 images belonging to 52 classes.
c:\Users\DELL\Desktop\Conestoga\AIML\FOML-FinalProject\WhoLooksLikeMe\venv\tensorflow_facenet\Lib\site-packages\keras\src\trainers\data_adapters\py_dataset_adapter.py:121: UserWarning: Your `PyDataset` class should call `super().__init__(**kwargs)` in its constructor. `**kwargs` can include `workers`, `use_multiprocessing`, `max_queue_size`. Do not pass these arguments to `fit()`, as they will be ignored.
  self._warn_if_super_not_called()
52/52 ━━━━━━━━━━━━━━━━━━━━ 15s 246ms/step
Out[2]:
                        accuracy  precision    recall  f1-score     support
johnny depp                 0.50   0.500000  0.500000  0.500000    4.000000
josh radnor                 0.75   0.600000  0.750000  0.666667    4.000000
katharine mcphee            0.50   1.000000  0.500000  0.666667    4.000000
katherine langford          1.00   1.000000  1.000000  1.000000    4.000000
keanu reeves                1.00   1.000000  1.000000  1.000000    4.000000
kiernen shipka              0.75   1.000000  0.750000  0.857143    4.000000
krysten ritter              1.00   0.800000  1.000000  0.888889    4.000000
leonardo dicaprio           0.50   1.000000  0.500000  0.666667    4.000000
lili reinhart               0.75   0.428571  0.750000  0.545455    4.000000
lindsey morgan              0.75   0.600000  0.750000  0.666667    4.000000
lionel messi                0.50   1.000000  0.500000  0.666667    4.000000
logan lerman                0.25   0.500000  0.250000  0.333333    4.000000
madelaine petsch            0.50   0.666667  0.500000  0.571429    4.000000
maisie williams             0.25   0.500000  0.250000  0.333333    4.000000
margot robbie               0.50   0.400000  0.500000  0.444444    4.000000
maria pedraza               0.50   0.500000  0.500000  0.500000    4.000000
marie avgeropoulos          1.00   1.000000  1.000000  1.000000    4.000000
mark ruffalo                0.75   0.428571  0.750000  0.545455    4.000000
mark zuckerberg             1.00   1.000000  1.000000  1.000000    4.000000
megan fox                   0.75   1.000000  0.750000  0.857143    4.000000
melissa fumero              0.75   0.750000  0.750000  0.750000    4.000000
miley cyrus                 0.50   0.285714  0.500000  0.363636    4.000000
millie bobby brown          0.50   0.500000  0.500000  0.500000    4.000000
morena baccarin             0.75   1.000000  0.750000  0.857143    4.000000
morgan freeman              1.00   1.000000  1.000000  1.000000    4.000000
nadia hilker                1.00   0.666667  1.000000  0.800000    4.000000
natalie dormer              0.50   0.400000  0.500000  0.444444    4.000000
natalie portman             0.50   0.666667  0.500000  0.571429    4.000000
neil patrick harris         0.75   0.750000  0.750000  0.750000    4.000000
pedro alonso                0.75   0.750000  0.750000  0.750000    4.000000
penn badgley                0.75   0.750000  0.750000  0.750000    4.000000
rami malek                  0.75   0.500000  0.750000  0.600000    4.000000
rebecca ferguson            0.75   1.000000  0.750000  0.857143    4.000000
richard harmon              0.75   0.750000  0.750000  0.750000    4.000000
rihanna                     0.75   1.000000  0.750000  0.857143    4.000000
robert de niro              0.75   1.000000  0.750000  0.857143    4.000000
robert downey jr            1.00   0.666667  1.000000  0.800000    4.000000
sarah wayne callies         1.00   0.800000  1.000000  0.888889    4.000000
scarlett johansson          0.50   0.500000  0.500000  0.500000    4.000000
selena gomez                0.50   0.500000  0.500000  0.500000    4.000000
shakira isabel mebarak      0.50   0.666667  0.500000  0.571429    4.000000
sophie turner               1.00   0.800000  1.000000  0.888889    4.000000
stephen amell               0.75   1.000000  0.750000  0.857143    4.000000
taylor swift                0.25   0.500000  0.250000  0.333333    4.000000
tom cruise                  0.50   0.666667  0.500000  0.571429    4.000000
tom ellis                   0.75   0.600000  0.750000  0.666667    4.000000
tom hardy                   1.00   1.000000  1.000000  1.000000    4.000000
tom hiddleston              0.25   1.000000  0.250000  0.400000    4.000000
tom holland                 0.25   0.250000  0.250000  0.250000    4.000000
tuppence middleton          1.00   1.000000  1.000000  1.000000    4.000000
ursula corbero              1.00   0.571429  1.000000  0.727273    4.000000
wentworth miller            0.75   0.750000  0.750000  0.750000    4.000000
accuracy                    0.00   0.692308  0.692308  0.692308    0.692308
macro avg                   0.00   0.730082  0.692308  0.689867  208.000000
weighted avg                0.00   0.730082  0.692308  0.689867  208.000000
Single image prediction¶
In [22]:
import time

start_time = time.perf_counter()  # High-resolution start time
predicted_personality, scores = classificationModel.predict_single_image("../test_images/tom.jpg")
end_time = time.perf_counter()  # High-resolution end time

elapsed_time_ms = (end_time - start_time) * 1000
print(f"Time taken for single prediction: {elapsed_time_ms:.3f} ms")
display(predicted_personality)
scores['Similarity Percentage'] = scores['Prediction Score'] * 100
scores_sorted = scores.sort_values(by='Similarity Percentage', ascending=False)

display(scores_sorted)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
Time taken for single prediction: 200.044 ms
'tom hiddleston'
Class Name Class Index Prediction Score Similarity Percentage
0 tom hiddleston 47 9.609074e-01 9.609074e+01
1 pedro alonso 29 2.442909e-02 2.442909e+00
2 josh radnor 1 7.978412e-03 7.978413e-01
3 mark ruffalo 17 3.342980e-03 3.342980e-01
4 johnny depp 0 8.858606e-04 8.858606e-02
5 robert downey jr 36 8.476512e-04 8.476512e-02
6 lionel messi 10 5.609069e-04 5.609069e-02
7 robert de niro 35 3.321819e-04 3.321819e-02
8 penn badgley 30 1.682654e-04 1.682653e-02
9 tom hardy 46 1.194267e-04 1.194267e-02
10 tom holland 48 8.350729e-05 8.350729e-03
11 richard harmon 33 8.192172e-05 8.192171e-03
12 tom ellis 45 6.404772e-05 6.404771e-03
13 rami malek 31 6.239067e-05 6.239067e-03
14 neil patrick harris 28 6.203799e-05 6.203800e-03
15 tom cruise 44 2.510485e-05 2.510485e-03
16 kiernen shipka 5 1.384289e-05 1.384289e-03
17 maisie williams 13 5.641338e-06 5.641338e-04
18 miley cyrus 21 5.505120e-06 5.505120e-04
19 keanu reeves 4 5.402623e-06 5.402623e-04
20 tuppence middleton 49 4.416541e-06 4.416541e-04
21 millie bobby brown 22 3.819254e-06 3.819254e-04
22 selena gomez 39 1.614891e-06 1.614891e-04
23 logan lerman 11 1.366841e-06 1.366841e-04
24 morgan freeman 24 1.243366e-06 1.243366e-04
25 maria pedraza 15 8.610864e-07 8.610864e-05
26 mark zuckerberg 18 8.558867e-07 8.558867e-05
27 leonardo dicaprio 7 7.843571e-07 7.843571e-05
28 katherine langford 3 7.189742e-07 7.189742e-05
29 nadia hilker 25 6.466557e-07 6.466558e-05
30 scarlett johansson 38 5.123255e-07 5.123255e-05
31 wentworth miller 51 4.419033e-07 4.419033e-05
32 madelaine petsch 12 2.825000e-07 2.825000e-05
33 natalie portman 27 2.564185e-07 2.564185e-05
34 stephen amell 42 1.437359e-07 1.437359e-05
35 natalie dormer 26 1.192505e-07 1.192505e-05
36 lili reinhart 8 7.299278e-08 7.299278e-06
37 sarah wayne callies 37 4.732647e-08 4.732648e-06
38 lindsey morgan 9 3.512869e-08 3.512868e-06
39 ursula corbero 50 3.491779e-08 3.491779e-06
40 rebecca ferguson 32 1.687709e-08 1.687709e-06
41 sophie turner 41 1.352530e-08 1.352530e-06
42 katharine mcphee 2 1.139787e-08 1.139787e-06
43 melissa fumero 20 1.013638e-08 1.013638e-06
44 margot robbie 14 8.310916e-09 8.310915e-07
45 taylor swift 43 6.768491e-09 6.768491e-07
46 shakira isabel mebarak 40 6.251693e-09 6.251693e-07
47 krysten ritter 6 4.397424e-09 4.397424e-07
48 marie avgeropoulos 16 3.596515e-09 3.596515e-07
49 megan fox 19 1.266936e-09 1.266936e-07
50 rihanna 34 5.250642e-10 5.250642e-08
51 morena baccarin 23 3.322136e-10 3.322135e-08

Note that the similarity percentage is shared among all N classes. In a strong-similarity case one class receives a score of X%, which means the remaining (100 - X)% is split among the other N-1 classes. This is expected from a softmax output layer. We discuss this further below.
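The sharing effect can be seen in a few lines of plain Python. This is a generic softmax, not code taken from the project, and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for 4 classes; the first class clearly dominates.
probabilities = softmax([5.0, 1.0, 0.0, -1.0])
print([round(p, 4) for p in probabilities])
print(sum(probabilities))
```

Because the outputs always sum to 1, a dominant class leaves very little for every other class, which is why the runner-up scores in the table above are so small.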

Training Time Results - For Classification Model¶

In [6]:
import pandas as pd
df = pd.read_csv("../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-L2-12-10-01-24/TrainingTimeResults.csv")
display(df)
Unnamed: 0 Training Classes TrainingTime
0 0 52 0 days 07:25:19.138662

Subset Models - C¶

  • Trained on a subset of 10 personalities.
  • Same architecture as the classification model we saw earlier.

Confusion Matrix¶

[Figure: confusion matrix]

In [9]:
from ClassificationModel import ClassificationModel


large_model_path = "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-C-12-09-22-11/best_model.keras"
names_csv_path = "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-C-12-09-22-11/WLLM-Model-Nikhil-C-12-09-22-11.csv"
image_dataset_path = "../dataset"
predictions_save_path = "../predictions/modelC"

classificationModel = ClassificationModel(large_model_path, names_csv_path)
Embedding model created using layer: 'embedding'.
In [10]:
classificationModel.evaluate_and_print_images("../TrainingDataImages/WLLM-Model-Nikhil-C-12-09-22-11/test")
Found 40 images belonging to 10 classes.
c:\Users\DELL\Desktop\Conestoga\AIML\FOML-FinalProject\WhoLooksLikeMe\venv\tensorflow_facenet\Lib\site-packages\keras\src\trainers\data_adapters\py_dataset_adapter.py:121: UserWarning: Your `PyDataset` class should call `super().__init__(**kwargs)` in its constructor. `**kwargs` can include `workers`, `use_multiprocessing`, `max_queue_size`. Do not pass these arguments to `fit()`, as they will be ignored.
  self._warn_if_super_not_called()
10/10 ━━━━━━━━━━━━━━━━━━━━ 5s 246ms/step
Out[10]:
                    accuracy  precision  recall  f1-score  support
chris evans             0.75   0.750000    0.75  0.750000      4.0
chris hemsworth         0.75   1.000000    0.75  0.857143      4.0
chris pratt             1.00   0.800000    1.00  0.888889      4.0
christian bale          1.00   0.666667    1.00  0.800000      4.0
cristiano ronaldo       0.75   1.000000    0.75  0.857143      4.0
danielle panabaker      1.00   1.000000    1.00  1.000000      4.0
dominic purcell         0.75   1.000000    0.75  0.857143      4.0
dwayne johnson          1.00   1.000000    1.00  1.000000      4.0
eliza taylor            1.00   1.000000    1.00  1.000000      4.0
elizabeth lail          1.00   1.000000    1.00  1.000000      4.0
accuracy                0.00   0.900000    0.90  0.900000      0.9
macro avg               0.00   0.921667    0.90  0.901032     40.0
weighted avg            0.00   0.921667    0.90  0.901032     40.0
Training time results - Model C¶
In [12]:
import pandas as pd
df = pd.read_csv("../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-C-12-09-22-11/TrainingTimeResults.csv")
display(df)
Unnamed: 0 Training Classes TrainingTime
0 0 10 0 days 01:06:31.188613

Comparison¶

  • The bigger classification model achieved ~71% accuracy.
  • The smaller models, each trained on a subset of personalities, achieved ~86% accuracy.
  • Training the classification model on 52 persons took about 7.5 hours.
  • Training each smaller model on 10 persons took about 1 hour.

Why is this important?

  • Training a single model on all 100 celebrities, with about 180 images per person on average, would take approximately 20 hours on a 16GB machine.
  • Whenever we want to add more celebrities, we would need to retrain the model on the entire dataset, which increases the training time and resource consumption each time.
  • Another issue is the similarity percentage output. The percentage is shared by the N classes, so if one celebrity gets a high percentage, the others have to share the remainder between them.
  • Our goal here is not classification accuracy. We want the top 5 celebrities a face is most similar to, and we don't want to restrict the model to predicting exactly one most-similar personality.
  • To solve this, we use an approach based on embeddings and cosine similarity, which we discuss next.

Embeddings and Cosine Similarity based Approach¶

  • While creating the model, we added a Flatten layer named embedding.
  • The output shape of this layer is (None, 2048).
  • After training the model, we run it against the test split, create embeddings for each personality, and save them as a NumPy file.
  • The embedding is simply the output of the embedding layer.
  • Call our three subset models Model-A, Model-B and Model-C, trained on, say, personalities P1-P10, P11-P20 and P21-P30 respectively.
  • We create embeddings for P1-P10 using the trained Model-A and save them as Emb1-Emb10, then repeat this for Model-B and Model-C.
  • After this step we have 3 models and 3 sets of 10 embeddings each.
  • At prediction time we receive an image to predict. Call it Unknown-Image.
  • First we create an embedding of Unknown-Image with each of the three models, giving 3 embeddings taken from the embedding layer output of each model.
  • We then calculate the cosine similarity of Unknown-Image-Embedding-A with all embeddings generated by Model-A, and do the same for the other two models. Finally we sort all the cosine similarity scores to get our final prediction.
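The prediction steps above can be sketched in plain Python. This is an illustration only, not the project's CosinePredictionHelper: the function names and the tiny 3-dimensional vectors are hypothetical, and it keeps a single reference embedding per personality for brevity, whereas the project may store several.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_across_models(query_embeddings, reference_embeddings, top_n=5):
    """Compare each model's embedding of the unknown image only with the
    reference embeddings produced by that same model, then rank everyone.

    query_embeddings:     {model_name: vector}
    reference_embeddings: {model_name: {personality: vector}}
    """
    scores = {}
    for model_name, references in reference_embeddings.items():
        query = query_embeddings[model_name]
        for personality, reference in references.items():
            scores[personality] = cosine_similarity(query, reference)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy 3-dimensional stand-ins for the real 2048-dimensional embeddings.
query = {"ModelA": [1.0, 0.0, 0.0], "ModelB": [0.0, 1.0, 0.0]}
references = {
    "ModelA": {"P1": [1.0, 0.0, 0.0], "P2": [0.0, 1.0, 0.0]},
    "ModelB": {"P11": [1.0, 1.0, 0.0], "P12": [0.0, 0.0, 1.0]},
}
print(rank_across_models(query, references, top_n=2))
```

Unlike softmax probabilities, each cosine score is computed independently, so a high score for one personality does not reduce the scores of the others. Adding a new group of celebrities only requires training one more subset model and storing its embeddings.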
The code for generating embeddings after training is in the create_embeddings_for_personality function of WLLMModel.py¶
  • All embeddings generated after training each model are saved in the ../SavedTrainingData/savedmodels/{ModelName}/embeddings folder.
In [2]:
from CosinePredictionHelper import CosinePredictionHelper

modelmap = {
    "ModelA": "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-A-12-09-17-01",  # Model A: trained on the first 10 personalities in alphabetical order
    "ModelB": "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-B-12-09-20-24",  # Model B: trained on the next 10
    "ModelC": "../SavedTrainingData/savedmodels/WLLM-Model-Nikhil-C-12-09-22-11",  # Model C: trained on the next 10
}

image_dataset_path = "../dataset"

predictions_save_path = "../predictions"

combinedCosinePredictor = CosinePredictionHelper(models=modelmap, N=5, image_dataset_path=image_dataset_path)
Embedding model created using layer: 'embedding'.
Embedding model created using layer: 'embedding'.
Embedding model created using layer: 'embedding'.
In [3]:
top_average, top_score = combinedCosinePredictor.run_pipeline("../test_images/lail.jpg", predictions_save_path)
../test_images/lail.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 155ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 159ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 137ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 140ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 144ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
*****************************************************
*****************************************************
Average scores: 
{'adriana lima': 0.7552034629978142, 'alex lawther': 0.7291574138313438, 'alexandra daddario': 0.7809922289458795, 'alvaro morte': 0.6710298529170245, 'alycia dabnem carey': 0.8062291955869716, 'amanda crew': 0.8075107267000539, 'amber heard': 0.8003184051134058, 'andy samberg': 0.7455171220479204, 'anne hathaway': 0.7903196199846573, 'anthony mackie': 0.6725007901870158}
Total scores: 
{'adriana lima': 46, 'alex lawther': 31, 'alexandra daddario': 67, 'alvaro morte': 16, 'alycia dabnem carey': 92, 'amanda crew': 90, 'amber heard': 80, 'andy samberg': 44, 'anne hathaway': 70, 'anthony mackie': 14}
**************************************************
Average Scores Sorted 
[('amanda crew', 0.8075107267000539), ('alycia dabnem carey', 0.8062291955869716), ('amber heard', 0.8003184051134058), ('anne hathaway', 0.7903196199846573), ('alexandra daddario', 0.7809922289458795), ('adriana lima', 0.7552034629978142), ('andy samberg', 0.7455171220479204), ('alex lawther', 0.7291574138313438), ('anthony mackie', 0.6725007901870158), ('alvaro morte', 0.6710298529170245)]
Frequency Scores Sorted 
[('alycia dabnem carey', 92), ('amanda crew', 90), ('amber heard', 80), ('anne hathaway', 70), ('alexandra daddario', 67), ('adriana lima', 46), ('andy samberg', 44), ('alex lawther', 31), ('alvaro morte', 16), ('anthony mackie', 14)]
*****************************************************


../test_images/lail.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 144ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 152ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 141ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
*****************************************************
*****************************************************
Average scores: 
{'avril lavigne': 0.7982952474906956, 'barack obama': 0.7055718900001599, 'barbara palvin': 0.8141436216290703, 'ben affleck': 0.6555188646902484, 'bill gates': 0.7192064350063573, 'bobby morley': 0.6797838426491514, 'brenton thwaites': 0.7912064754726523, 'brian j. smith': 0.708207934623567, 'brie larson': 0.8394150175369706, 'camila mendes': 0.7588219721635066}
Total scores: 
{'avril lavigne': 79, 'barack obama': 36, 'barbara palvin': 86, 'ben affleck': 10, 'bill gates': 47, 'bobby morley': 21, 'brenton thwaites': 73, 'brian j. smith': 37, 'brie larson': 99, 'camila mendes': 62}
**************************************************
Average Scores Sorted 
[('brie larson', 0.8394150175369706), ('barbara palvin', 0.8141436216290703), ('avril lavigne', 0.7982952474906956), ('brenton thwaites', 0.7912064754726523), ('camila mendes', 0.7588219721635066), ('bill gates', 0.7192064350063573), ('brian j. smith', 0.708207934623567), ('barack obama', 0.7055718900001599), ('bobby morley', 0.6797838426491514), ('ben affleck', 0.6555188646902484)]
Frequency Scores Sorted 
[('brie larson', 99), ('barbara palvin', 86), ('avril lavigne', 79), ('brenton thwaites', 73), ('camila mendes', 62), ('bill gates', 47), ('brian j. smith', 37), ('barack obama', 36), ('bobby morley', 21), ('ben affleck', 10)]
*****************************************************


../test_images/lail.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 140ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 140ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 161ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 144ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 145ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 164ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
*****************************************************
*****************************************************
Average scores: 
{'chris evans': 0.7102440640083643, 'chris hemsworth': 0.75016666041898, 'chris pratt': 0.6963686011032049, 'christian bale': 0.7019059148946654, 'cristiano ronaldo': 0.6831637687403139, 'danielle panabaker': 0.8286191067448486, 'dominic purcell': 0.6718353691328152, 'dwayne johnson': 0.7032357028691432, 'eliza taylor': 0.8010359258531172, 'elizabeth lail': 0.8457364129231838}
Total scores: 
{'chris evans': 52, 'chris hemsworth': 62, 'chris pratt': 36, 'christian bale': 43, 'cristiano ronaldo': 28, 'danielle panabaker': 89, 'dominic purcell': 13, 'dwayne johnson': 46, 'eliza taylor': 81, 'elizabeth lail': 100}
**************************************************
Average Scores Sorted 
[('elizabeth lail', 0.8457364129231838), ('danielle panabaker', 0.8286191067448486), ('eliza taylor', 0.8010359258531172), ('chris hemsworth', 0.75016666041898), ('chris evans', 0.7102440640083643), ('dwayne johnson', 0.7032357028691432), ('christian bale', 0.7019059148946654), ('chris pratt', 0.6963686011032049), ('cristiano ronaldo', 0.6831637687403139), ('dominic purcell', 0.6718353691328152)]
Frequency Scores Sorted 
[('elizabeth lail', 100), ('danielle panabaker', 89), ('eliza taylor', 81), ('chris hemsworth', 62), ('chris evans', 52), ('dwayne johnson', 46), ('christian bale', 43), ('chris pratt', 36), ('cristiano ronaldo', 28), ('dominic purcell', 13)]
*****************************************************


In [4]:
top_average, top_score = combinedCosinePredictor.run_pipeline("../test_images/alvaro.jpg", predictions_save_path)
../test_images/alvaro.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 145ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 141ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 149ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 157ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 150ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step
*****************************************************
*****************************************************
Average scores: 
{'adriana lima': 0.7138186123543341, 'alex lawther': 0.7271488314488156, 'alexandra daddario': 0.6694106970404181, 'alvaro morte': 0.8304647378852088, 'alycia dabnem carey': 0.7260740585151053, 'amanda crew': 0.6980455482500423, 'amber heard': 0.6667116630203677, 'andy samberg': 0.7853243319796406, 'anne hathaway': 0.6682534670805661, 'anthony mackie': 0.7921371315828758}
Total scores: 
{'adriana lima': 53, 'alex lawther': 60, 'alexandra daddario': 21, 'alvaro morte': 99, 'alycia dabnem carey': 64, 'amanda crew': 41, 'amber heard': 21, 'andy samberg': 86, 'anne hathaway': 20, 'anthony mackie': 85}
**************************************************
Average Scores Sorted 
[('alvaro morte', 0.8304647378852088), ('anthony mackie', 0.7921371315828758), ('andy samberg', 0.7853243319796406), ('alex lawther', 0.7271488314488156), ('alycia dabnem carey', 0.7260740585151053), ('adriana lima', 0.7138186123543341), ('amanda crew', 0.6980455482500423), ('alexandra daddario', 0.6694106970404181), ('anne hathaway', 0.6682534670805661), ('amber heard', 0.6667116630203677)]
Frequency Scores Sorted 
[('alvaro morte', 99), ('andy samberg', 86), ('anthony mackie', 85), ('alycia dabnem carey', 64), ('alex lawther', 60), ('adriana lima', 53), ('amanda crew', 41), ('alexandra daddario', 21), ('amber heard', 21), ('anne hathaway', 20)]
*****************************************************


../test_images/alvaro.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 144ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 144ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 148ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 143ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 169ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 141ms/step
*****************************************************
*****************************************************
Average scores: 
{'avril lavigne': 0.6550516486124576, 'barack obama': 0.7610881629676871, 'barbara palvin': 0.6943307861788807, 'ben affleck': 0.8024995985539279, 'bill gates': 0.7592740446343538, 'bobby morley': 0.8043018680525895, 'brenton thwaites': 0.7669554263679743, 'brian j. smith': 0.7562569705153827, 'brie larson': 0.7177858155266847, 'camila mendes': 0.7032858050606837}
Total scores: 
{'avril lavigne': 10, 'barack obama': 63, 'barbara palvin': 23, 'ben affleck': 95, 'bill gates': 65, 'bobby morley': 95, 'brenton thwaites': 69, 'brian j. smith': 63, 'brie larson': 36, 'camila mendes': 31}
**************************************************
Average Scores Sorted 
[('bobby morley', 0.8043018680525895), ('ben affleck', 0.8024995985539279), ('brenton thwaites', 0.7669554263679743), ('barack obama', 0.7610881629676871), ('bill gates', 0.7592740446343538), ('brian j. smith', 0.7562569705153827), ('brie larson', 0.7177858155266847), ('camila mendes', 0.7032858050606837), ('barbara palvin', 0.6943307861788807), ('avril lavigne', 0.6550516486124576)]
Frequency Scores Sorted 
[('ben affleck', 95), ('bobby morley', 95), ('brenton thwaites', 69), ('bill gates', 65), ('barack obama', 63), ('brian j. smith', 63), ('brie larson', 36), ('camila mendes', 31), ('barbara palvin', 23), ('avril lavigne', 10)]
*****************************************************


../test_images/alvaro.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 137ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 139ms/step
*****************************************************
*****************************************************
Average scores: 
{'chris evans': 0.8129509737873967, 'chris hemsworth': 0.789883999219224, 'chris pratt': 0.7966906192232432, 'christian bale': 0.7870220700216634, 'cristiano ronaldo': 0.7830499023296358, 'danielle panabaker': 0.7392215487123732, 'dominic purcell': 0.800850999255398, 'dwayne johnson': 0.7815715672888071, 'eliza taylor': 0.7040024700367011, 'elizabeth lail': 0.7317640174171449}
Total scores: 
{'chris evans': 94, 'chris hemsworth': 66, 'chris pratt': 79, 'christian bale': 60, 'cristiano ronaldo': 54, 'danielle panabaker': 32, 'dominic purcell': 81, 'dwayne johnson': 47, 'eliza taylor': 11, 'elizabeth lail': 26}
**************************************************
Average Scores Sorted 
[('chris evans', 0.8129509737873967), ('dominic purcell', 0.800850999255398), ('chris pratt', 0.7966906192232432), ('chris hemsworth', 0.789883999219224), ('christian bale', 0.7870220700216634), ('cristiano ronaldo', 0.7830499023296358), ('dwayne johnson', 0.7815715672888071), ('danielle panabaker', 0.7392215487123732), ('elizabeth lail', 0.7317640174171449), ('eliza taylor', 0.7040024700367011)]
Frequency Scores Sorted 
[('chris evans', 94), ('dominic purcell', 81), ('chris pratt', 79), ('chris hemsworth', 66), ('christian bale', 60), ('cristiano ronaldo', 54), ('dwayne johnson', 47), ('danielle panabaker', 32), ('elizabeth lail', 26), ('eliza taylor', 11)]
*****************************************************


In [3]:
top_average, top_score = combinedCosinePredictor.run_pipeline("../test_images/sreehari.jpg", predictions_save_path)
../test_images/sreehari.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 149ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 150ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 137ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 132ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 131ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
*****************************************************
*****************************************************
Average scores: 
{'adriana lima': 0.6464845691050131, 'alex lawther': 0.7584731672988688, 'alexandra daddario': 0.6643089328741357, 'alvaro morte': 0.7782157352269008, 'alycia dabnem carey': 0.6967903282887404, 'amanda crew': 0.7061726500822362, 'amber heard': 0.6230225440039927, 'andy samberg': 0.7819599009811565, 'anne hathaway': 0.6782800397293341, 'anthony mackie': 0.7718958093086739}
Total scores: 
{'adriana lima': 21, 'alex lawther': 73, 'alexandra daddario': 31, 'alvaro morte': 91, 'alycia dabnem carey': 53, 'amanda crew': 55, 'amber heard': 12, 'andy samberg': 92, 'anne hathaway': 39, 'anthony mackie': 83}
**************************************************
Average Scores Sorted 
[('andy samberg', 0.7819599009811565), ('alvaro morte', 0.7782157352269008), ('anthony mackie', 0.7718958093086739), ('alex lawther', 0.7584731672988688), ('amanda crew', 0.7061726500822362), ('alycia dabnem carey', 0.6967903282887404), ('anne hathaway', 0.6782800397293341), ('alexandra daddario', 0.6643089328741357), ('adriana lima', 0.6464845691050131), ('amber heard', 0.6230225440039927)]
Frequency Scores Sorted 
[('andy samberg', 92), ('alvaro morte', 91), ('anthony mackie', 83), ('alex lawther', 73), ('amanda crew', 55), ('alycia dabnem carey', 53), ('anne hathaway', 39), ('alexandra daddario', 31), ('adriana lima', 21), ('amber heard', 12)]
*****************************************************


../test_images/sreehari.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 153ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 145ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 148ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 155ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 165ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 141ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 132ms/step
*****************************************************
*****************************************************
Average scores: 
{'avril lavigne': 0.6588976332096701, 'barack obama': 0.7843169072676662, 'barbara palvin': 0.6787097841372003, 'ben affleck': 0.801008746411525, 'bill gates': 0.7384742151623434, 'bobby morley': 0.7883663236450413, 'brenton thwaites': 0.7645397771087815, 'brian j. smith': 0.7876602844767159, 'brie larson': 0.6951760204836428, 'camila mendes': 0.6425233320751266}
Total scores: 
{'avril lavigne': 18, 'barack obama': 77, 'barbara palvin': 31, 'ben affleck': 92, 'bill gates': 50, 'bobby morley': 84, 'brenton thwaites': 65, 'brian j. smith': 82, 'brie larson': 37, 'camila mendes': 14}
**************************************************
Average Scores Sorted 
[('ben affleck', 0.801008746411525), ('bobby morley', 0.7883663236450413), ('brian j. smith', 0.7876602844767159), ('barack obama', 0.7843169072676662), ('brenton thwaites', 0.7645397771087815), ('bill gates', 0.7384742151623434), ('brie larson', 0.6951760204836428), ('barbara palvin', 0.6787097841372003), ('avril lavigne', 0.6588976332096701), ('camila mendes', 0.6425233320751266)]
Frequency Scores Sorted 
[('ben affleck', 92), ('bobby morley', 84), ('brian j. smith', 82), ('barack obama', 77), ('brenton thwaites', 65), ('bill gates', 50), ('brie larson', 37), ('barbara palvin', 31), ('avril lavigne', 18), ('camila mendes', 14)]
*****************************************************


../test_images/sreehari.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 132ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 166ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 158ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 142ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 153ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 137ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 154ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 138ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 124ms/step
*****************************************************
*****************************************************
Average scores: 
{'chris evans': 0.8029547805821805, 'chris hemsworth': 0.7566127451244979, 'chris pratt': 0.7809077954243951, 'christian bale': 0.7475244210982548, 'cristiano ronaldo': 0.7756844375792936, 'danielle panabaker': 0.7215544321141862, 'dominic purcell': 0.7868324608506905, 'dwayne johnson': 0.7371024466346089, 'eliza taylor': 0.6906395878236402, 'elizabeth lail': 0.6917106245251968}
Total scores: 
{'chris evans': 95, 'chris hemsworth': 60, 'chris pratt': 81, 'christian bale': 48, 'cristiano ronaldo': 75, 'danielle panabaker': 34, 'dominic purcell': 85, 'dwayne johnson': 40, 'eliza taylor': 15, 'elizabeth lail': 17}
**************************************************
Average Scores Sorted 
[('chris evans', 0.8029547805821805), ('dominic purcell', 0.7868324608506905), ('chris pratt', 0.7809077954243951), ('cristiano ronaldo', 0.7756844375792936), ('chris hemsworth', 0.7566127451244979), ('christian bale', 0.7475244210982548), ('dwayne johnson', 0.7371024466346089), ('danielle panabaker', 0.7215544321141862), ('elizabeth lail', 0.6917106245251968), ('eliza taylor', 0.6906395878236402)]
Frequency Scores Sorted 
[('chris evans', 95), ('dominic purcell', 85), ('chris pratt', 81), ('cristiano ronaldo', 75), ('chris hemsworth', 60), ('christian bale', 48), ('dwayne johnson', 40), ('danielle panabaker', 34), ('elizabeth lail', 17), ('eliza taylor', 15)]
*****************************************************


In [4]:
top_average, top_score = combinedCosinePredictor.run_pipeline("../test_images/nikhil.jpg", predictions_save_path)
../test_images/nikhil.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 197ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 145ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 160ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 163ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 141ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 157ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 147ms/step
*****************************************************
*****************************************************
Average scores: 
{'adriana lima': 0.6373770698025024, 'alex lawther': 0.7173183755388431, 'alexandra daddario': 0.617166898138498, 'alvaro morte': 0.7443825179164236, 'alycia dabnem carey': 0.6518436915728826, 'amanda crew': 0.6306253974117697, 'amber heard': 0.613804717314867, 'andy samberg': 0.6919717722241882, 'anne hathaway': 0.6130849640213506, 'anthony mackie': 0.7102457581003344}
Total scores: 
{'adriana lima': 44, 'alex lawther': 88, 'alexandra daddario': 24, 'alvaro morte': 97, 'alycia dabnem carey': 59, 'amanda crew': 41, 'amber heard': 22, 'andy samberg': 74, 'anne hathaway': 20, 'anthony mackie': 81}
**************************************************
Average Scores Sorted 
[('alvaro morte', 0.7443825179164236), ('alex lawther', 0.7173183755388431), ('anthony mackie', 0.7102457581003344), ('andy samberg', 0.6919717722241882), ('alycia dabnem carey', 0.6518436915728826), ('adriana lima', 0.6373770698025024), ('amanda crew', 0.6306253974117697), ('alexandra daddario', 0.617166898138498), ('amber heard', 0.613804717314867), ('anne hathaway', 0.6130849640213506)]
Frequency Scores Sorted 
[('alvaro morte', 97), ('alex lawther', 88), ('anthony mackie', 81), ('andy samberg', 74), ('alycia dabnem carey', 59), ('adriana lima', 44), ('amanda crew', 41), ('alexandra daddario', 24), ('amber heard', 22), ('anne hathaway', 20)]
*****************************************************


../test_images/nikhil.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 153ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 137ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 131ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 149ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 152ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 153ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 158ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 146ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 156ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 151ms/step
*****************************************************
*****************************************************
Average scores: 
{'avril lavigne': 0.5914019411362856, 'barack obama': 0.652860486208307, 'barbara palvin': 0.6286232985035607, 'ben affleck': 0.7308583024218456, 'bill gates': 0.6470643041051096, 'bobby morley': 0.7661884285373283, 'brenton thwaites': 0.6956500025111925, 'brian j. smith': 0.6604570837160469, 'brie larson': 0.6453119040916272, 'camila mendes': 0.6222438890417109}
Total scores: 
{'avril lavigne': 10, 'barack obama': 55, 'barbara palvin': 33, 'ben affleck': 93, 'bill gates': 47, 'bobby morley': 97, 'brenton thwaites': 79, 'brian j. smith': 63, 'brie larson': 47, 'camila mendes': 26}
**************************************************
Average Scores Sorted 
[('bobby morley', 0.7661884285373283), ('ben affleck', 0.7308583024218456), ('brenton thwaites', 0.6956500025111925), ('brian j. smith', 0.6604570837160469), ('barack obama', 0.652860486208307), ('bill gates', 0.6470643041051096), ('brie larson', 0.6453119040916272), ('barbara palvin', 0.6286232985035607), ('camila mendes', 0.6222438890417109), ('avril lavigne', 0.5914019411362856)]
Frequency Scores Sorted 
[('bobby morley', 97), ('ben affleck', 93), ('brenton thwaites', 79), ('brian j. smith', 63), ('barack obama', 55), ('bill gates', 47), ('brie larson', 47), ('barbara palvin', 33), ('camila mendes', 26), ('avril lavigne', 10)]
*****************************************************


../test_images/nikhil.jpg
Total Remaining : 10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 148ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 125ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 126ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 130ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 136ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 135ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 131ms/step
*****************************************************
*****************************************************
Average scores: 
{'chris evans': 0.726281781831082, 'chris hemsworth': 0.6942807134388362, 'chris pratt': 0.7185822028521882, 'christian bale': 0.7108857690420141, 'cristiano ronaldo': 0.6852512982873885, 'danielle panabaker': 0.6271865754214808, 'dominic purcell': 0.6873146799000256, 'dwayne johnson': 0.6853692542743464, 'eliza taylor': 0.6515792482996364, 'elizabeth lail': 0.6535641010655902}
Total scores: 
{'chris evans': 95, 'chris hemsworth': 63, 'chris pratt': 88, 'christian bale': 82, 'cristiano ronaldo': 53, 'danielle panabaker': 12, 'dominic purcell': 55, 'dwayne johnson': 50, 'eliza taylor': 27, 'elizabeth lail': 25}
**************************************************
Average Scores Sorted 
[('chris evans', 0.726281781831082), ('chris pratt', 0.7185822028521882), ('christian bale', 0.7108857690420141), ('chris hemsworth', 0.6942807134388362), ('dominic purcell', 0.6873146799000256), ('dwayne johnson', 0.6853692542743464), ('cristiano ronaldo', 0.6852512982873885), ('elizabeth lail', 0.6535641010655902), ('eliza taylor', 0.6515792482996364), ('danielle panabaker', 0.6271865754214808)]
Frequency Scores Sorted 
[('chris evans', 95), ('chris pratt', 88), ('christian bale', 82), ('chris hemsworth', 63), ('dominic purcell', 55), ('cristiano ronaldo', 53), ('dwayne johnson', 50), ('eliza taylor', 27), ('elizabeth lail', 25), ('danielle panabaker', 12)]
*****************************************************


Summary¶

  • Overall accuracy on the test images is about 75% for the model trained on the larger dataset.
  • For the subset models it is about 87%.
  • Training on 52 personalities took about 8 hours.
  • Training a subset model took about 1 hour.
  • For critical recognition tasks we have to rely on a model trained on the entire dataset.
  • Prediction with the classification model takes only about 200 milliseconds, while the subset-based model takes about 7.5 seconds. This can be reduced to about 2.5 seconds on a 16 GB machine if we compute the cosine similarities concurrently.
  • For our use case the subset-model architecture is the better choice, both for calculating similarity scores and for making it easier to add new faces to the model pipeline.
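
The concurrent cosine-similarity idea mentioned above could look roughly like the sketch below. It is not the project's code: the function names, the `subsets` mapping, and the choice of three worker threads (one per subset group seen in the outputs) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def cosine_scores(query, stored):
    """Cosine similarity between one query vector and every row of `stored`."""
    query = query / np.linalg.norm(query)
    stored = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    return stored @ query


def score_subsets_concurrently(subsets, max_workers=3):
    """Run the cosine-similarity step for every subset model in parallel.

    `subsets` is a hypothetical mapping:
        subset name -> (query embedding, stored embeddings of shape (n, dim))
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per subset model, then collect the results by name.
        futures = {
            name: pool.submit(cosine_scores, query, stored)
            for name, (query, stored) in subsets.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Threads are used here because the heavy work is a NumPy matrix product; whether this reaches the 2.5 second figure depends on the machine and on how much of the time is spent in the embedding step rather than the similarity step.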

Additional Tweak¶

  • To increase accuracy we applied one more step on top.
  • Instead of creating the embedding once per unknown image per model, we also augment the unknown image for 10 iterations and calculate the cosine similarity for each augmented copy. This was seen to improve the predictions.
  • Due to time constraints we were not able to capture metrics for these improvements, but we are working on it.
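
A minimal sketch of this tweak is shown below. The augmentation (random horizontal flip plus a small brightness change), the averaging of the 10 scores, and the `embed` callable standing in for the model's embedding step are all assumptions; the notebook does not show which augmentations or aggregation were actually used.

```python
import numpy as np


def augment(image, rng):
    """Return a lightly perturbed copy: random horizontal flip and brightness jitter."""
    out = image[:, ::-1, :] if rng.random() < 0.5 else image
    factor = rng.uniform(0.9, 1.1)
    return np.clip(out * factor, 0.0, 1.0)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def augmented_similarity(image, reference, embed, iterations=10, seed=0):
    """Average cosine similarity over several augmented views of the unknown image.

    `image` is an (H, W, 3) array scaled to [0, 1]; `embed` is a hypothetical
    callable mapping such an image to a 1-D embedding vector.
    """
    rng = np.random.default_rng(seed)
    scores = [
        cosine(embed(augment(image, rng)), reference) for _ in range(iterations)
    ]
    return sum(scores) / len(scores)
```

Averaging over augmented views smooths out the effect of any single crop, flip, or lighting condition on the score, which is one plausible reason the predictions improved.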

Further Improvements¶

Dataset Improvements¶
  • We have noticed that the incorrect predictions stem from a key issue in the dataset that can be corrected: it does not contain only straight-on portrait photos of the celebrities but a mix of different angles and side profiles.
  • We can clean the dataset and keep only classic portrait images to solve this issue. This is especially useful for us since we will be asking users for a similar image.
  • Another model that detects images deviating from a classic portrait can be developed and used in the pipeline.
  • The next major improvement is to include personalities with distinctive facial features. For example, in the last predicted outputs we can see that most of the celebrities have similar features.
  • The dataset includes only Hollywood celebrities, which makes it unsuitable for other ethnic groups.
  • We should extend this dataset to include familiar faces from all ethnic groups and nations.
  • In the quest to use as much data as possible, we haven't undersampled images for any personality. After inspecting the results we know that some celebrities have more than 200 images and some have fewer than 90. This imbalance should be corrected so that no personality dominates the model.
  • We should also get rid of low-resolution images: the model expects 299x299 images as standard input, so smaller images can be stripped out and replaced.
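
The undersampling and resolution filtering described above can be sketched as follows. The function names and the input structures (a mapping of personality to image paths, and a mapping of path to width and height) are hypothetical; in practice the sizes would come from reading the image files.

```python
import random


def undersample(images_by_person, cap=None, seed=0):
    """Keep at most `cap` images per person (default: the smallest class size)."""
    if cap is None:
        cap = min(len(paths) for paths in images_by_person.values())
    rng = random.Random(seed)
    return {
        person: sorted(rng.sample(paths, min(cap, len(paths))))
        for person, paths in images_by_person.items()
    }


def drop_low_resolution(sizes, min_side=299):
    """Keep only paths whose width and height are both at least `min_side`."""
    return [path for path, (w, h) in sizes.items() if min(w, h) >= min_side]
```

With the counts reported in the EDA (86 to 237 images per personality), the default cap would bring every class down to 86 images.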
Model Improvements¶
  • We haven't tuned the learning rate, and we haven't experimented with unfreezing fewer or more layers of the base model, which could result in better accuracy.
  • We can also add more layers on top and experiment with changing the layer we take as the embedding. An embedding larger or smaller than the current shape (0,2048) might work best for our use case.
  • We used VGG16 and also tried to obtain a FaceNet implementation at the start of the project, but it was difficult to get hold of. We can explore this further.
  • We can also try other models, such as ResNet from Microsoft, to increase accuracy.
  • To speed up prediction with the embedding-based model, we can store one averaged embedding per personality in the embedding file instead of storing the embeddings of all training images. This will drastically reduce the time taken for prediction.
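
The averaged-embedding idea in the last point could be sketched as below. This is an illustration under assumptions: the function names and the `embeddings_by_person` mapping are hypothetical, and normalising the mean vector is a design choice made here so that the dot product equals the cosine similarity.

```python
import numpy as np


def build_centroids(embeddings_by_person):
    """Collapse each person's (n_images, dim) embeddings into one unit-length mean vector."""
    centroids = {}
    for person, embeddings in embeddings_by_person.items():
        mean = np.asarray(embeddings, dtype=float).mean(axis=0)
        centroids[person] = mean / np.linalg.norm(mean)
    return centroids


def rank_by_centroid(query, centroids):
    """Rank personalities by cosine similarity between the query and each centroid."""
    query = np.asarray(query, dtype=float)
    query = query / np.linalg.norm(query)
    scores = {person: float(c @ query) for person, c in centroids.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Prediction then needs one comparison per personality instead of one per training image. The trade-off is that a single mean vector may blur together different poses of the same person, so the effect on accuracy would need to be measured.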